GrowthPredict: A toolbox and tutorial-based primer for fitting and forecasting growth trajectories using phenomenological growth models

Simple dynamic modeling tools can help generate real-time short-term forecasts with quantified uncertainty of the trajectory of diverse growth processes unfolding in nature and society, including disease outbreaks. An easy-to-use and flexible toolbox for this purpose is lacking. This tutorial-based primer introduces and illustrates GrowthPredict, a user-friendly MATLAB toolbox for fitting and forecasting time-series trajectories using phenomenological dynamic growth models based on ordinary differential equations. This toolbox is accessible to a broad audience, including students training in mathematical biology, applied statistics, and infectious disease modeling, as well as researchers and policymakers who need to conduct short-term forecasts in real-time. The models included in the toolbox capture exponential and sub-exponential growth patterns that typically follow a rising pattern followed by a decline phase, a common feature of contagion processes. Models include the 1-parameter exponential growth model and the 2-parameter generalized-growth model, which have proven useful in characterizing and forecasting the ascending phase of epidemic outbreaks. It also includes the 2-parameter Gompertz model, the 3-parameter generalized logistic-growth model, and the 3-parameter Richards model, which have demonstrated competitive performance in forecasting single peak outbreaks. We provide detailed guidance on forecasting time-series trajectories and available software (https://github.com/gchowell/forecasting_growthmodels), including the full uncertainty distribution derived through parametric bootstrapping, which is needed to construct prediction intervals and evaluate their accuracy. Functions are available to assess forecasting performance across different models, estimation methods, error structures in the data, and forecasting horizons. The toolbox also includes functions to quantify forecasting performance using metrics that evaluate point and distributional forecasts, including the weighted interval score. This tutorial and toolbox can be broadly applied to characterizing and forecasting time-series data using simple phenomenological growth models. As a contagion process takes off, the tools presented in this tutorial can help create forecasts to guide policy regarding implementing control strategies and assess the impact of interventions. The toolbox functionality is demonstrated through various examples, including a tutorial video, and the examples use publicly available data on the monkeypox (mpox) epidemic in the USA.


Installing the toolbox
• Download the MATLAB code located in folder 'forecasting_growthmodels code' from the GitHub reposi- tory: https:// github.com/ gchow ell/ forec asting_ growt hmode ls • Create 'input' folder in your working directory where your input data will be stored.
• Create 'output' folder in your working directory where the output files will be stored.
• Open a MATLAB session.

Overview of the toolbox functions
Table 1 lists the names of user functions associated with the toolbox, along with a brief description of their role.The internal functions associated with the toolbox are given in supplementary file 1 (Table 1S).The user needs

Parameter estimation method
Let y t 1 , y t 2 , . . ., y t n d denote the time series of the epidemic trajectory used in calibrating the model.Here, t j , j = 1, 2, . . ., n d , are the time points for the time series data, and n d is the number of observations.Let f (t, �) denote the expected curve of the epidemic trajectory.We can estimate the set of model parameters, denoted by , by fitting the model solution to the observed data via nonlinear least squares (LSQ) 23 ; within the MATLAB toolbox, this is realized by setting the parameter <method1> to 0 in the options_fit.mor options_ forecast.mfiles.Least squares estimation is achieved by searching for the set of parameters, , that minimizes the sum of squared differences between the observed data y t 1 , y t 2 . . ...y t n d and the best fit of the model (model mean) which corresponds to f (t, �) .That is, is estimated by � = argmin n d j=1 (f t j , � − y t j ) 2 .In the following section, we describe different phenomenological growth models included in this toolbox for the expected epidemic trajectory curve f (t, �).This parameter estimation method (LSQ) weights each data point equally and does not require a specific distributional assumption for y t , except for the first moment E[y t ] = f (t; �) .That is, the mean of the observed data at time t equals the expected count denoted by f (t, �) at time t 24 .Under mild regularity conditions, this method yields asymptotically unbiased point estimates regardless of any misspecification of the variance-covariance error structure.Hence, the estimated model f (t, �) best fits the mean of the observed data in terms of the L2 norm.We can use the fmincon function in MATLAB to set the optimization problem.Finally, we also employ MATLAB's MultiStart feature to specify the number of random initial guesses of the model parameters using the parameter <numstartpoints> in the options_fit.mor options_forecast.mfiles to thoroughly search for the best-fit parameter estimates.
In addition to nonlinear least squares fitting, we can also estimate model parameters via maximum likelihood estimation (MLE) 25 with specific assumptions about the error structure in the data (e.g., Poisson, Negative binomial) via parameter <method1> .The log-likelihood expressions derived for Poisson and negative binomial error structures are given below.

(a) Poisson
For a Poisson error structure, the full log-likelihood of Poisson (<method1>=1) is given by: Table 1.Description of the user functions associated with the toolbox.

options_fit.m
Specifies the parameters related to model fitting, including the characteristics of the time series data, the model, the parameter estimation method, the error structure, and the calibration period.The structure of the options_fit.mfile is described in Text 1S (supplementary file 1)

options_forecast.m
Specifies the parameters related to model forecasting, including the characteristics of the time series data, the model, the parameter estimation method, the error structure, the calibration period, and forecasting period.The structure of the options_forecast.mfile is described in Text 2S (supplementary file 1)

options_Rt.m
Specifies the parameters related to the estimation of the effective reproduction number, namely the type of generation interval distribution and the mean and variance of the generation interval for the infectious disease of interest plotGrowthModel.mPlots model solutions where the user provides the type of growth model, parameter values, and the initial condition

Run_Fit_GrowthModels.m Fits a model to data with quantified uncertainty
Run_Forecasting_GrowthModels.m Fits a model to data with quantified uncertainty and generates a model-based forecast with quantified uncertainty plotFit_GrowthModels Displays the model fit and the empirical distribution of the estimated parameters.It also saves output .csvfiles in the output folder with the model fit, the parameter estimates with 95% confidence intervals (CIs), the calibration performance metrics, and the doubling times estimated from the incidence trajectory plotForecast_GrowthModels Displays the model-based forecast and the associated performance metrics.Moreover, the data associated with the forecasts, the parameter estimates, the doubling times estimated from the incidence trajectory, and the calibration and forecasting performance metrics, are saved as .csvfiles in the output folder plotFit_ReproductionNumber Displays the model fit and the effective reproduction number.It also saves .csvfiles in the output folder with the model fit, the parameter estimates with 95% CIs, the calibration performance metrics, and the doubling times estimated from the incidence trajectory plotForecast_ReproductionNumber Displays the model-based forecast and the corresponding effective reproduction number.Moreover, the data associated with the forecasts, the parameter estimates, the doubling times estimated from the incidence trajectory, and the calibration and forecasting performance metrics, are saved as .csvfiles in the output folder where µ i = f (t i , �) denotes the mean of y i at time t i and f (t i , �) is the mean curve to be estimated from the differential equation.

(b) Negative binomial
Let r > 0 denote the number of failures until the experiment is stopped and p ∈ [0, 1] denote the success probability in each experiment.Then the number of successes, y, before the r-th failure occurs has a negative binomial distribution: www.nature.com/scientificreports/ We can express the likelihood with µ and σ 2 by substituting p = 1 − µ σ 2 and r = µ 2 σ 2 −µ , where µ = f (t, �) is the mean curve to be estimated from the differential equation.
There are different types of variances commonly used in a negative binomial distribution.If the variance scales linearly with the mean, then σ 2 = µ + αµ,(i.e., <method1>=3 in options_fit.mor options_ forecast.m),p = α 1+α and r = µ/α .The full log-likelihood (1.1) can be expressed as follows: If the variance scales quadratically with the mean, then σ 2 = µ + αµ 2 (ie., <method1>=4 in options_ fit.m or options_forecast.m),p = αµ 1+αµ and r = 1/α.The full log-likelihood (1.1) can be expressed as follows: The more general form of variance is σ 2 = µ + αµ d (i.e., <method1>=5 in options_fit.mor options_forecast.m)with any −∞ < d < ∞ .Then the full log-likelihood (1.1) can be expressed as follows: where Finally, it is worth noting that the above full log-likelihood expressions allow the selection or comparison of models based on different error structures via their AIC c (corrected Akaike Information Criterion) values.However, if the goal is to focus on different models with the same type of error structure (e.g., normal), we could use simplified likelihood expressions by removing the constants to speed up running time.

Parametric bootstrapping
To quantify parameter uncertainty, we follow a parametric bootstrapping approach, which allows the computation of standard errors and related statistics without closed-form formulas 26 .We generate B bootstrap samples from the best-fit model f (t, �) , with an assumed error structure specified using parameter <dist1> in the options_fit.mor options_forecast.mfiles, to quantify the uncertainty of the parameter estimates and construct confidence intervals.The bootstrapping algorithm is given in ref 6 .Typically, the error structure in the data is modeled using a probability model such as the normal, Poisson or negative binomial distribution.Using nonlinear least squares (<method1>=0), in addition to a normally distributed error structure (<dist1>=0), we can also assume a Poisson (<dist1>=1) or a negative binomial distribution (<dist1>=2), whereby the variance-to-mean ratio is empirically estimated from the time series.To estimate this constant ratio, we group a fixed number of observations (e.g., 7 observations for daily data into a bin across time), calculate the mean and variance for each bin, and then estimate a constant variance-to-mean ratio by calculating the average of the variance-to-mean ratios over these bins.For non-normal error structure, we estimate parameters using maximum likelihood estimation (MLE) assuming Poisson or negative binomial error structures in the data (<method1>=1 & <dist1>=1 for Poisson and <method1>=3, <method1>=4 & <dist1>=4, and <method1>=5 & <dist1>=5 for the different negative binomial error structures described above).
Specifically, using the best-fit model f (t, �) , we generate B-times replicated simulated datasets of size n d , where the observation at time t j is sampled from the corresponding distribution specified by <dist1>.Next, we refit the model to each of the B simulated bootstrap datasets and re-estimate the parameters using the same estimation method as for the original data.The new parameter estimates for each realization are denoted by b , where b = 1, 2, . . ., B. Using the B sets of re-estimated parameters b , we can characterize the empirical dis- tribution of each parameter estimate, calculate the variance, and construct confidence intervals for each parameter.The resulting uncertainty around the model fit can similarly be obtained from f t, 1 , f t, � 2 , . . ., f (t, � B ) .For the purposes of this tutorial, we characterize the uncertainty using 300 bootstrap realizations (i.e., parameter < B > = 300 in the options_fit.mor options_forecast.mfiles).

Model-based forecasts with quantified uncertainty
Based on the best-fit model f t, , we can forecast h days ahead based on the estimate f (t + h, �) .The uncer- tainty of the forecasted value can be obtained using the previously described parametric bootstrap method.Let (1.4) www.nature.com/scientificreports/denote the forecasted value of the current state of the system propagated by a horizon of h time units, where b denotes the estimation of parameter set from the b th bootstrap sample.We can use these values to calculate the bootstrap variance to measure the uncertainty of the forecasts and use the 2.5% and 97.5% percentiles to construct the 95% prediction intervals (95% PI).We can set the forecasting horizon (h) using the parameter <forecastingperiod1> in the options_forecast.mfile.
For illustration, we fit the models through LSQ with a normal error structure (i.e., <method1>=0 and <dist1>=0) for the monkeypox data.In the options_fit.mor options_forecast.mfiles, the values of the parameters related to the parameter estimation method and parametric bootstrapping follow: www.nature.com/scientificreports/

Phenomenological growth models
Below we describe the growth models included in the toolbox.We use C(t) to denote the cumulative case count at time t and C ′ (t) denotes the expected epidemic trajectory curve f (t, �).
(a) Generalized-growth model (GGM) Models commonly used to study the growth pattern of infectious disease outbreaks often assume exponential growth in the absence of control interventions (e.g., compartmental models); however, growth patterns are likely slower than exponential for some diseases, depending on the transmission mode and population structure.For example, Ebola spreads via close contact, therefore, sub-exponential growth patterns would be expected in a constrained population contact structure 27 .The generalized growth model (GGM) 21 includes a "deceleration of growth" parameter, p (range: [0, 1]), that relaxes the assumption of exponential growth.A value of p = 0 rep- resents constant (linear) growth, while a value of p = 1 indicates exponential growth.If 0 < p < 1 , the growth pattern is characterized as sub-exponential or polynomial.
The GGM is as follows: where the derivative C ′ (t) describes the incidence curve over time t.The positive parameter r is the growth rate parameter ( r > 0 ), and p is the scaling of growth parameter 21 .For this model, we estimate = r, p .This model can be selected by setting <flag1>=0 in the options_fit.mand options_forecast.mfiles.

(b) Generalized logistic growth model (GLM)
The generalized logistic growth model (GLM) has three parameters and is given by: The growth scaling parameter, p , is also used in the GLM to model a range of early epidemic growth profiles ranging from constant incidence p = 0 , polynomial 0 < p < 1, and exponential growth dynamics (p = 1) .When p = 1 , this model reduces to the logistic growth model (< flag1 > = 3).The remaining model parameters are as follows: r is the growth rate, and K 0 is the final cumulative epidemic size.For this model, � = (r, p, K 0 ) .The GLM has been employed to generate short-term forecasts of Zika, Ebola, and COVID-19 epidemics 10,14,15,28 .This model can be selected by setting <flag1>=1 in the options_fit.mand options_forecast.mfiles.

(c) Richards model (RIC)
The well-known Richards model extends the simple logistic growth model and relies on three parameters.It extends the simple logistic growth model by incorporating a scaling parameter, a, that measures the deviation from the symmetric simple logistic growth curve 6,29,30 .The Richards model is given by the differential equation: where r is the growth rate, a is a scaling parameter and K 0 is the final epidemic size.The Richards model has been employed to generate short-term forecasts of SARS, Zika, Ebola, and COVID-19 epidemics 10,11,14,15,28 .For this model, we estimate � = (r, a, K 0 ) .This model can be selected by setting <flag1>=4 in the options_fit.mand options_forecast.mfiles.A 4-parameter extension of the Richards model is the generalized Richards model (<flag1>=2), which incorporates the growth scaling parameter p used in the GGM and GLM.

(d) Gompertz model (GOM)
The 2-parameter Gompertz model is given by: where r is the growth rate and b > 0 describes the exponential decline of the growth rate.For this model, we estimate � = (r, b).The GOM model has been employed to generate short-term forecasts of Zika and COVID-19 epidemics 17,31,32 .This model can be selected by setting <flag1>=5 in the options_fit.mand options_forecast.mfiles.

Initial condition
Besides the parameters of the dynamic growth models, it is also possible to estimate the initial number of cases in the time series, rather than fixing C(0) according to the first data point in the time series by specifying the Boolean variable <fxI0> in the options_fit.mand options_forecast.mfiles.Specifically, <fixI0>=1 www.nature.com/scientificreports/fixes the initial condition according to the first data point in the time series, whereas <fixI0>=0 estimates the initial condition along the other model parameters.

Quality of model fit
To assess the quality of model fit, we can compare the AIC c (corrected Akaike Information Criterion) values of the best-fit models.The AIC c is given by 33,34 : where m is the number of model parameters and n d is the number of data points.Specifically for normal distri- bution, the AIC c is where . Thus, this metric accounts for model complexity regarding the number of model parameters and is used for model selection.In the options_fit.mand options_forecast.mfiles, the values of the parameters related to the selection of the growth model follow: % 3 = LM RICH=4; % 4 = Richards GOM=5; % 5 = Gompertz flag1=GLM; % Growth model considered in the epidemic trajectory model_name1='GLM'; % A string provided by the user for the name of the model fixI0=1; % 0=Estimate the initial number of cases; 1 = Fix the initial number of cases according to the first data point in the time series

Plotting simulations of the growth model
Before fitting the growth model to the data, it is helpful to check that the selected model yields simulations broadly consistent with the range of the time series data by generating model simulations with different parameter values.For example, if data show systematic differences that contrast with the model solutions (e.g., multiple peaks or plateaus), it may suggest that the model is not the best choice for the data at hand.
The function plotGrowthModel.m can be used to plot model solutions where the user provides the type of growth model, parameter values, and initial condition as passing input parameters to the function in the following order: <flag1>, r , p , a, K, C(0), and finally the duration of the simulation (i.e., the time span over which the model is numerically solved).For example, the following call plots a simulation of the GLM model (<flag1>=1) with the following parameter values: r = 0.18, p = 0.9, and K = 1000.The initial condition is indicated by C(0) = 1 , and the total duration of the simulation is set at 200. > > plotGrowthModel(1,0.18,0.9,[],10000,1,200)Of note, in the above call, the value of parameter a is passed empty ([]) since the GLM model does not use this parameter.This function will generate a figure (Fig. 2A) that shows the corresponding model solution, dC(t)/dt .Additional representative simulations from other growth models supported in the toolbox are shown in Fig. 2 Model and code testing Before fitting the model to the data, it is helpful to check that the code works properly with simulated data before applying the toolbox to actual data.If we can estimate the true parameter values used to generate simulated data from the model, we can confidently move forward that the code is working appropriately with simulated data.For this purpose, we used the following selected values r = 0.16, p = 0.94 and K = 10000, to generate a dataset from the GLM with Poisson error structure, initial condition C(0) = 1 , and duration of 120 days.
We then proceeded to estimate the three model parameters for the simulated data using maximum likelihood (i.e., <method1>=1, <dist1>=1).The corresponding options_fit.mfile is given in supplementary file 1 (Code File 1S). Figure 3 shows the GLM fit and the empirical distribution of the estimated parameters for the simulated data.These findings indicate that the parameter estimates align with the "true" parameter values used to simulate the data.Moreover, the model is well calibrated to the data with the 95% prediction interval coverage falling at 95%.

Performance metrics
To assess calibration and forecasting performance, we used four performance metrics: the mean absolute error (MAE), the mean squared error (MSE), the coverage of the 95% prediction intervals (95% PI), and the weighted interval score (WIS) 35 .While it is possible to generate h-time units ahead forecasts of an evolving process, those forecasts looking into the future can only be evaluated when sufficient data for the h-time units ahead has been collected.In the options_forecast.mfile, the parameter <getperformance> is a Boolean variable (0/1) to indicate whether the user wishes to compute the performance metrics of the forecasts when sufficient data is available.
The mean absolute error (MAE) is given by: where t h are the time points of the time series data 36 , and N is the calibration or forecasting period length.Simi- larly, the mean squared error (MSE) is given by: The coverage of the 95% prediction interval (95% PI) corresponds to the fraction of data points that fall within the 95% PI, and is calculated as where L t and U t are the lower and upper bounds of the 95% PIs, respectively, Y t are the data and 1 is an indicator variable that equals 1 if Y t is in the specified interval and 0 otherwise.
The weighted interval score (WIS) 35,37 , which is a proper score recently embraced for quantifying model forecasting performance in epidemic forecasting studies [38][39][40][41][42] , provides quantiles of predictive forecast distribution by combining a set of Interval Score (IS) for probabilistic forecasts.An IS is a simple proper score that requires only a central (1 − α) * 100% PI 35 and is described as In this Eq. 1 refers to the indicator function, meaning that 1 y < l = 1 if y < l and 0 otherwise.The terms l and u represent the α 2 and 1 − α 2 quantiles of the forecast F. The IS consists of three distinct quantities: 1.The sharpness of F, given by the width u − l of the central (1 − α) × 100% PI. 2. A penalty term 2 α × l − y × 1 y < l for the observations that fall below the lower end point l of the (1 − α) × 100% PI.This penalty term is directly proportional to the distance between y and the lower end l of the PI.The strength of the penalty depends on the level α.
3. An analogous penalty term 2 α × y − u × 1 y > u for the observations falling above the upper limit u of the PI.To provide more detailed and accurate information on the entire predictive distribution, we report several central PIs at different levels along with the predictive median, y , which can be seen as a central prediction interval at level 1 − α 0 → 0 .This is referred to as the WIS, and it can be evaluated as follows: where, w k = α k 2 for k = 1, 2, . . ..K and w 0 = 1 2 .Hence, WIS can be interpreted as a measure of how close the entire distribution is to the observation in units on the scale of the observed data 39,43 .

Doubling times
Doubling times characterize the sequence of times at which the cumulative incidence doubles.Denote the times at which cumulative incidence doubles by t d j , such that 2C(t d j ) = C(t d j+1 ) where t d 0 = 0, C t d 0 = C 0 , j = 1, 2, 3, . . ., n g and n g is the total number of times cumulative incidence doubles 44,45 .The actual sequence of "doubling times" is defined as follows: For exponential growth, doubling times remain invariant and are given by (ln2)/r , whereas the doubling times increase when the growth pattern follows sub-exponential growth 44 .We can characterize the doubling times and their uncertainty from the best-fit model f t, 46 .We can evaluate the uncertainty of the sequence of doubling times and the overall doubling time using the model parameter estimates derived from bootstrapping b , where b = 1, 2, 3, . . ., B .That is, d j ( � b ) provides a sequence of doubling times for a set of bootstrap parameter estimates, b , where b = 1, 2, 3, . . ., B .We can use these curves to derive 95% CIs for the sequence of doubling times and quantify the probability of observing a given number of doublings.

The effective reproduction number, R t
While the basic reproduction number, commonly denoted by R 0 , gauges the transmission potential at the onset of an epidemic 47 , the effective reproduction number, R t , captures changes in transmission potential throughout the epidemic 48,49 .We can characterize the effective reproduction number and its uncertainty from the best-fit model f t, 46 .We can derive the 95% CI of R t from the uncertainty associated with the parameter estimates derived from bootstrapping ( � b ) where b = 1, 2, 3, . . ., B .That is, R t ( � b ) provides a curve of the effective repro- duction number for a set of parameter values b where b = 1, 2, 3, . . ., B .Denote the incidence at calendar time t j by I(t j , � b ) , and the discretized probability distribution of the generation interval by ρ t j .The effective reproduc- tion number R t ( � b ) can be estimated using the renewal equation 48,49 : where the denominator represents the total number of cases that contribute (as primary cases) to generate the number of new cases I t j (as secondary cases) at calendar time t j 49 .In the options_Rt.m file, we can specify the parameters related to the estimation of the effective reproduction number, namely the type of generation interval distribution (type_GId1), the mean of the generation interval (mean_GI1) and the variance of the generation interval (var_GI1) of the infectious disease of interest.For monkeypox, we employ a gamma distributed generation interval with a mean of 1.78 weeks and variance of 0.65 weeks 250 as shown below:

Results and discussion
The input dataset For this toolbox, the time series data will be stored in the 'input' folder and needs to be a text file with the extension *.txt.The first column should correspond to the time index: 0,1,2,3, …, and the second column corresponds to the temporal incidence data.If the time series file contains cumulative incidence count data, the name of the time series data file must start with "cumulative".
To illustrate the methodology presented in this tutorial paper, we employ the weekly incidence curve of monkeypox cases reported in the USA from the publicly available data published by the Centers for Disease Control and Prevention (CDC) from the week of 12 May 2022 through the week of 15 December 2022 51 .The data file is pre-loaded in the input folder within the toolbox's working directory (data file path: ./input/Most_ Recent_Timeseries_US-CDC.txt).A snapshot in Excel of the contents of the file is shown in Fig. 4.
In the options_fit.mand options_forecast.mfiles, the user specifies the parameters related to model fitting and forecasting, respectively, as shown below.For instance, the parameter < cadfilename1> is a string used to indicate the name of the data file, parameter <caddisease> is a string used to indicate the name of the disease related to the time series data, and < datatype> is a string parameter indicating the nature of the data (e.g., cases, deaths, hospitalizations).

Fitting the models to data with quantified uncertainty
The function Run_Fit_GrowthModels.m can be used to fit one of the phenomenological growth models to data with quantified uncertainty.The function uses the input parameters specified by the user in the options_fit.mfile.However, the function may receive the parameters related to the rolling window analysis (<tstart1>, <tend1>, and <windowsize1>) as passing input parameters with the remaining inputs accessed from the options_fit.mfile.
For example, we can fit the generalized logistic growth model (<flag1>=1 in options_fit.mfile) to the weekly incidence curve of monkeypox in the USA pre-loaded in the input folder (data file path: ./input/Most_Recent_Timeseries_US-CDC.txt).We assume a normal error structure (i.e., <dist1>=0 in the options_fit.mfile) and examine the fit of the model to the data.Since the monkeypox epidemic curve comprises 32 weeks of data, we can pass the rolling window parameters to the function call in MATLAB as follows: >> Run_Fit_GrowthModels(1,1,32).
In the above call to the function, <tstart1>=1, <tend1>=1 , and <windowsize>=32.Hence, this function will generate a single model fit and store several output MATLAB files related to the model fit, parameter estimates, and the quality of model fit in the output folder.For each model fit, it will also generate a figure with the model fit and the corresponding empirical distributions of the estimated parameters (Fig. 5).
This function also plots the temporal sequence of parameter estimates and their uncertainty obtained from the rolling-window analysis whenever the value of the parameter <tend1> is greater than parameter <tstart1>.For instance, after running the function Run_Fit_GrowthModels(1,3,30) to generate a rolling window analysis of model fits to capture the outbreak's trajectory, we employed a window size of 30 with start times at 1 (end time: 30), 2 (end time: 31), and 3 (end time: 32), we can run the function plotFit_GrowthModels(1,3,30) from MATLAB's command line to generate the rolling window analysis plot (Fig. 6).
We can also assess the fit of the Richards model to the monkeypox incidence curve (<flag1>=4 in options_fit.mfile) and compare the quality of the model fit to that obtained using the generalized logistic growth model using the performance metrics.Figure 7 shows the fit of the Richards model to the epidemic curve and the empirical distribution of the parameters.
The calibration performance metrics of the generalized logistic growth model and the Richards model (Table 2) indicate that the Richards model yields a better fit to the data in terms of the MAE, MSE, and WIS while both models achieved a 100% coverage of the 95% PI.

Plotting and assessing model-based forecasts
To generate a forecast, we can use the function Run_Forecasting_GrowthModels.m.This function uses the input parameters provided by the user in the options_forecast.mfile.However, the function can also receive <tstart1>, <tend1>, <windowsize1>, and <forecastingperiod> as passing input parameters with the remaining input parameters accessed from the options_forecast.mfile.
For example, we can fit the generalized logistic growth model (<flag1> =1 in options_forecast.m file) to the first 10 weeks of the monkeypox epidemic in the USA assuming a normal error structure (i.e., <dist1> = 0 in options_forecast.m file) and generate a 4-week ahead prediction by running the function from MAT-LAB's command line as follows: >> Run_Forecasting_GrowthModels(1,1,10,4) This function will generate a single model fit and store several output MATLAB files related to the model fit and forecast, parameter estimates, and the calibration and forecasting performance metrics.It will also generate a figure with the model fit and 4-week ahead forecast and the corresponding empirical distributions of the parameters (Fig. 8).Overall, the 4-week ahead forecast shown in Fig. 8 underpredicted the trajectory of the epidemic.Figure 1S (supplementary file 1) illustrates the results of a timing study related to the employed modeling methodology utilized in Fig. 8.For comparison, Fig. 9 also shows the model fit and 4-week ahead forecast when the model is calibrated using the first 12 weeks of the epidemic curve instead of the first 10 weeks.The 4-week ahead forecast calibrated on the first 12 weeks of data forecasted better the epidemic curve than the one calibrated on the first 10 weeks of data.
Once the user has executed the function Run_Forecasting_GrowthModels, the function plot-Forecast_GrowthModels can be used to plot the model-based forecast and the performance metrics of the forecast (MSE, MAE, 95% PI, WIS) based on the inputs indicated in the options_forecast.mfile.However, this function can also receive <tstart1>, <tend1>, < windowsize1>, and <forecastingperiod>   as passing input parameters while the remaining inputs are retrieved from the options_forecast.mfile.Moreover, the data associated with the forecasts, the parameter estimates, the Monte Carlo standard errors (MCSE) of the parameter estimates, the AIC c values, the calibration and forecasting performance metrics, and the doubling times of the entire trajectory, including the forecasting period, are saved as .csvfiles in the output folder.For example, the following line illustrates the execution of the function from MATLAB's command window: >> plotForecast_GrowthModels(1,1,10,4) This function plots the model fit based on the first 10-weeks of data as the calibration period and produces a 4-week ahead forecast and the empirical distribution of the estimated parameters (Fig. 8).It also displays the associated forecasting performance metrics (Fig. 10).
Similarly, we can generate the forecast using the Richards model by specifying <flag1>=4 in the options_ forecast.mfile and compare the forecasting performance of this model with that obtained employing the generalized logistic growth model using the performance metrics.Figure 11 shows the corresponding forecast of the Richards model based on the first 10 weeks of the epidemic curve and the resulting empirical distribution of the parameters.
The forecasting performance metrics of the generalized logistic growth model and the Richards model based on the first 10 weeks of the epidemic curve (Table 3) indicate that the generalized logistic growth model outperformed the Richards model in terms of forecasting performance.

Plotting the effective reproduction number, R t
Once the Run_Fit_GrowthModels.m has been executed, the user can also run the function plotFit_ ReproductionNumber.m to display the effective reproduction number, R t based on the inputs indicated in the options_fit.mand options_Rt.m files (Fig. 12).It also saves .csvfiles in the output folder with the effective reproduction number, the model fit, the parameter estimates, including 95% CIs, the Monte Carlo standard errors (MCSE) of the parameter estimates, the AIC c values, the calibration performance metrics, and the estimated doubling times of the incidence trajectory.The call to the function from MATLAB's command line follows: >> plotFit_ReproductionNumber(1,1,32) The function uses the same rolling window parameters employed in the Run_Fit_GrowthModels.m function call.This function will store the following .csvfiles in the output folder: The model fit is consistent with early exponential growth dynamics (i.e., p ~ 1), and has an estimated growth rate ( r) between 0.73 and 0.94.The epidemic size parameter, K, was estimated to be between 120,000 and 170,000 cases, and the scaling parameter, a , is not estimated when employing a GLM.The horizontal dashed lines in the top panels show the range of the 95% CIs of the parameter estimates.In the bottom panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.The vertical dashed line separates the 10-week calibration (left) and the 4-week ahead forecasting period (right).Overall, the 4-week ahead forecast underpredicted the trajectory of the epidemic.times, and the effective reproduction number are saved as .csvfiles in the output folder.For example, the following line illustrates the execution of the function from MATLAB's command window: >> plotForecast_ReproductionNumber(1,1,10,4) This function plots the model fit based on a 10-week calibration period, a 4-week ahead forecast, and the corresponding effective reproduction number (Fig. 13).

Conclusion
In this tutorial-based primer, we have introduced the first toolbox, which will be broadly applicable to fit and forecast time-series trajectories using phenomenological dynamic growth models.In particular, the models included in the toolbox have been frequently applied to characterize and forecast epidemic trajectories in near real-time 10,11,14,15,18,21,28 .The toolbox can be used as part of the curriculum of student training in mathematical biology, applied statistics, infectious disease modeling, and specialty courses in epidemic modeling and

Figure 1 .
Figure 1.Overview of the workflow for fitting and forecasting time series trajectories described in this tutorial.

Figure 2 .
Figure 2.Representative simulations from various growth models, namely (A) <flag1>=1, (B) <flag1>=2, (C) <flag1>=3, and (D) <flag1>=4, supported in the toolbox using parameter values: r = 0.18, p = 0.9, a = 0.6, K = 10000.Here, r refers to the growth parameter, p is a growth scaling parameter (GLM and GGM models), and a is a scaling parameter employed with the Richards model.K refers to the size of the epidemic.The initial condition is C(0) = 1 , and the total duration of the simulation is set at 200.

Figure 3 .
Figure 3.The generalized logistic-growth model (GLM) fit, and the empirical distribution of the estimated parameters for the simulated data.These findings indicate that the parameter estimates of r (growth rate), p (growth scaler) and K (epidemic size) are in line with the "true" parameter values used to simulate the data.The scaling parameter, a , is not estimated when employing the GLM.In the bottom panel, the solid red line is the median model fit and the dashed lines correspond to the 95% prediction intervals.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.

Figure 4 .
Figure 4.A screenshot of the weekly incidence curve of monkeypox (mpox) cases reported in the USA published by the Centers for Disease Control and Prevention (CDC) for the week of 12 May 2022 through the week of 15 December 202251 .The file is in the input folder within the working directory (data file path:./input/Most_Recent_Timeseries_US-CDC.txt).The first column of the file corresponds to the time index, and the second column corresponds to the weekly incidence curve of monkeypox cases.

Figure 5 .
Figure 5. Fitting the generalized logistic-growth model (GLM) to the entire incidence curve of the monkeypox epidemic in the USA for the week of 12 May 2022 through the week of 15 December 2022.The model provides a good fit to the entire incidence curve.The model supports sub-exponential growth dynamics (i.e., p ~ 0.8-0.9),and has a growth rate ( r) estimated between 1.6 and 2.3.The epidemic size parameter, K, was estimated at ~ 28,000-30,000 cases, and the scaling parameter, a , is not estimated when employing the GLM.The horizontal dashed lines in the top panels show the range of the 95% CIs of the parameter estimates.In the bottom panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit. https://doi.org/10.1038/s41598-024-51852-8

Figure 6 .
Figure 6.Results of the rolling window analysis after running the function plotFit_ GrowthModels(1,3,30).The top panel shows the monkeypox epidemic curve in the USA for reference.The bottom panels show the temporal sequence of parameter estimates (-o-, red line), and their 95% CIs (blue dashed lines) for three different moving time windows (1-30, 2-31, and 3-32).

Figure 7 .
Figure 7.The fit of the Richards model to the entire incidence curve of the monkeypox epidemic in the USA for the week of 12 May 2022 through the week of 15 December 2022.The model provides a good fit to the entire incidence curve.The epidemic size parameter, K, was estimated at ~ 28,000-30,000 cases, similar to that estimated using the generalized logistic-growth model.The growth rate, r , was estimated to be between 0.88 and 0.98, and the scaling parameter, a , between 0.32 and 0.4.The growth scaling parameter, p , is not estimated for the Richards model.The horizontal dashed lines in the top panels show the range of the 95% CIs of the parameter estimates.In the bottom panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.

Figure 8 .
Figure 8.The generalized logistic-growth model (GLM) fit, and 4-week ahead forecast based on the first 10 weeks of the monkeypox epidemic in the USA for the week of 12 May 2022 through the week of 14 July 2022.The model fit is consistent with early exponential growth dynamics (i.e., p ~ 1), and has an estimated growth rate ( r) between 0.73 and 0.94.The epidemic size parameter, K, was estimated to be between 120,000 and 170,000 cases, and the scaling parameter, a , is not estimated when employing a GLM.The horizontal dashed lines in the top panels show the range of the 95% CIs of the parameter estimates.In the bottom panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.The vertical dashed line separates the 10-week calibration (left) and the 4-week ahead forecasting period (right).Overall, the 4-week ahead forecast underpredicted the trajectory of the epidemic.

Figure 10 .
Figure 10.Forecasting performance metrics (MAE, MSE, 95% PI coverage, WIS) associated with the 4-week ahead forecasts obtained from fitting the generalized logistic growth model to the first 10 weeks of the monkeypox epidemic in the USA.

Figure 11 .
Figure 11.The Richards model fit, and 4-week ahead forecast based on the first 10 weeks of the monkeypox epidemic in the USA for the week of 12 May 2022 through the week of 14 July 2022.The model fit estimates the epidemic size at K ~ 8300-11,000 cases, and the growth rate, r , was estimated to be between 0.71 and 0.73.The scaling parameter, a , fell between 1.2 and 2.1.The growth scaling parameter, p , is not estimated for the Richards model.The horizontal dashed lines in the top panels show the range of the 95% CIs of the parameter estimates.In the bottom panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines (closely aligned with the median model fit) correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.The vertical dashed line separates the 10-week calibration period (left) and the 4-week ahead forecast period (right).Overall, the 4-week ahead forecast underpredicted the trajectory of the epidemic.

Table 3 .
Calibration and forecasting performance metrics obtained from the fits of the generalized logistic growth model and the Richards model based on the first 10 weeks of the monkeypox epidemic in the USA forecasting out 4-weeks.The metrics indicate that the Richards model yields a better fit to the data (better calibration performance) than the generalized logistic growth model in terms of the MAE, MSE, and WIS.However, the generalized logistic growth model outperformed the Richards model in terms of forecasting performance.

Figure 12 .
Figure 12.The top panel displays the fit of the generalized logistic-growth model to the entire incidence curve of the monkeypox epidemic in the USA for the week of 12 May 2022 through the week of 15 December 2022.In the top panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.The bottom panel shows the corresponding effective reproduction number, R t , assuming a gamma distribution for the generation interval with a gamma distributed generation interval with a mean of 1.78 weeks and variance of 0.65 weeks 250 .The solid red line corresponds to the mean effective reproduction number, while the dashed lines correspond to the 95% confidence bounds around the mean.

Figure 13 .
Figure 13.The top panel displays the generalized logistic-growth model fit, and 4-week ahead forecast based on the first 10 weeks of the monkeypox (mpox) epidemic in the USA, the week of 12 May 2022 through the week of 14 July 2022.In the top panel, the solid red line is the median model fit, and the dashed lines correspond to the 95% PIs.The blue dots indicate the observed data points.The gray lines correspond to the mean of the model fits obtained from the parametric bootstrapping with 300 bootstrap realizations, and the cyan lines indicate the predictive uncertainty around the model fit.The vertical dashed line separates the 10-week calibration period (left) and the 4-week ahead forecast period (right).The bottom panel shows the corresponding effective reproduction number, R t , assuming a gamma distribution for the generation interval with a mean of 1.78 weeks and variance of 0.65 weeks 250 .The solid red line corresponds to the mean effective reproduction number, while the dashed lines correspond to the 95% confidence bounds around the mean.

Table 2 .
Calibration performance metrics for a 32-week calibration period quantifying how well the fits of the generalized logistic growth model and the Richards model captured the epidemic curve of monkeypox in the USA.The metrics indicate that the Richards model yields a better fit to the data in terms of the MAE, MSE, and WIS while both models achieved a 100% coverage of the 95% prediction interval.